A Language-Independent Feature Schema for Inflectional Morphology

Authors

  • John Sylak-Glassman
  • Christo Kirov
  • David Yarowsky
  • Roger Que
Abstract

This paper presents a universal morphological feature schema that represents the finest distinctions in meaning that are expressed by overt, affixal inflectional morphology across languages. This schema is used to universalize data extracted from Wiktionary via a robust multidimensional table parsing algorithm and feature mapping algorithms, yielding 883,965 instantiated paradigms in 352 languages. These data are shown to be effective for training morphological analyzers, yielding significant accuracy gains when applied to Durrett and DeNero’s (2013) paradigm learning framework.

Similar resources

A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging

Semantically detailed and typologically-informed morphological analysis that is broadly applicable cross-linguistically has the potential to improve many NLP applications, including machine translation, n-gram language models, information extraction, and co-reference resolution. In this paper, we present a universal morphological feature schema, which is a set of features that represent the fin...

Representing Lexical Knowledge for Bulgarian Inflectional Morphology in DATR

The paper analyses the application of the DATR language for lexical knowledge presentation for interpreting Bulgarian inflectional morphology. It discusses the semantic network of the feature of definiteness in the Bulgarian language and compares the lexical knowledge representation for the different parts of speech with respect to the defined grammar rules, the sound alternations, the related formal pres...

Recognition and Generation of word form for natural language understanding systems: Integrating two-level morphology and feature unification

A language-independent morphological component for the recognition and generation of word forms is presented. Based on a lexicon of morphs, the approach combines two-level morphology and a feature-based unification grammar describing word formation. To overcome the heavy use of diacritics, feature structures are associated with the two-level rules. These feature structures function as filters f...

Discriminative n-gram language modeling for Turkish

In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems due to its rich morphology. Therefore, in addition to word n-gram features, morphology-based features like root n-grams and inflectional group n-grams are incorporated into DLMs in order to improve the langua...

Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations

In this paper, we address the task of language-independent, knowledge-lean and unsupervised compound splitting, which is an essential component for many natural language processing tasks such as machine translation. Previous methods on statistical compound splitting either include language-specific knowledge (e.g., linking elements) or rely on parallel data, which results in limited applicabilit...

Publication date: 2015